Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model

Authors

Bo Li

Tara N. Sainath

Khe Chai Sim

Michiel Bacchiani

Eugene Weinstein

Patrick Nguyen

Zhifeng Chen

Yonghui Wu

Kanishka Rao

Abstract

Sequence-to-sequence models provide a simple and elegant solution for building speech recognition systems by folding the separate components of a typical system, namely the acoustic (AM), pronunciation (PM) and language (LM) models, into a single neural network. In this work, we look at one such sequence-to-sequence model, listen, attend and spell (LAS) [1], and explore the possibility of training a single model to serve different English dialects. This simplifies the process of training multi-dialect systems, because it removes the need for a separate AM, PM and LM for each dialect. We show that simply pooling the data from all dialects into one LAS model falls behind the performance of a model fine-tuned on each dialect. We then look at incorporating dialect-specific information into the model in two ways: by modifying the training targets, inserting the dialect symbol at the end of the original grapheme sequence, and by feeding a 1-hot representation of the dialect information into all layers of the model. Experimental results on seven English dialects show that our proposed system is effective in modeling dialect variations within a single LAS model, outperforming a LAS model trained individually on each of the seven dialects by 3.1% to 16.5% relative.
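The two ways of injecting dialect information described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the dialect labels `d0` to `d6`, the `<d2>`-style token format, and the function names are all hypothetical, and the 1-hot vector is concatenated only onto input feature frames here, whereas the abstract describes feeding it into all layers of the model.

```python
from typing import List

# Hypothetical labels: the abstract only says "seven English dialects".
DIALECTS = ["d0", "d1", "d2", "d3", "d4", "d5", "d6"]


def append_dialect_symbol(graphemes: List[str], dialect: str) -> List[str]:
    """Return the grapheme targets with a dialect token appended at the end."""
    if dialect not in DIALECTS:
        raise ValueError(f"unknown dialect: {dialect}")
    # The "<...>" token format is an assumption made for illustration.
    return graphemes + [f"<{dialect}>"]


def one_hot(dialect: str) -> List[float]:
    """Encode the dialect as a 1-hot vector over DIALECTS."""
    vec = [0.0] * len(DIALECTS)
    vec[DIALECTS.index(dialect)] = 1.0
    return vec


def augment_frames(frames: List[List[float]], dialect: str) -> List[List[float]]:
    """Concatenate the dialect 1-hot vector onto every feature frame.

    Simplification: only the input is augmented; the abstract describes
    feeding the vector into all layers.
    """
    d = one_hot(dialect)
    return [frame + d for frame in frames]


if __name__ == "__main__":
    print(append_dialect_symbol(list("hello"), "d2"))
    print(augment_frames([[0.5, 0.1], [0.3, 0.9]], "d2"))
```

With the target-side variant, the model must predict the dialect token as part of its output; with the 1-hot variant, the dialect is assumed to be known at inference time and supplied as an extra input.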

Similar resources

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

Seismic Data Forecasting: A Sequence Prediction or a Sequence Recognition Task

In this paper, we have tried to predict earthquake events in a cluster of seismic data on pacific ring of fire, using multivariate adaptive regression splines (MARS). The model is employed as either a predictor for a sequence prediction task, or a binary classifier for a sequence recognition problem, which could alternatively help to predict an event. Here, we explain that sequence prediction/r...

Reducing out-of-vocabulary in morphology to improve the accuracy in Arabic dialects speech recognition

This thesis has two aims: developing resources for Arabic dialects and improving the speech recognition of Arabic dialects. Two important components are considered: Pronunciation Dictionary (PD) and Language Model (LM). Six parts are involved, which relate to finding and evaluating dialects resources and improving the performance of systems for the speech recognition of dialects. Three resource...

A fuzzy multi-objective linear programming approach for solving a new multi-objective job shop scheduling with sequence-dependent setup times

This paper presents a new mathematical model for a bi-objective job shop scheduling problem with sequence-dependent setup times that minimizes the weighted mean completion time and the weighted mean tardiness time. For solving this multi-objective model, we develop a fuzzy multi-objective linear programming (FMOLP) model. In this problem, a proposed FMOLP method is applied with respect to the o...

Journal: CoRR

Volume: abs/1712.01541

Publication date: 2017
